Search CORE

49 research outputs found

Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition

Author: Fornés Alicia
Kang Lei
Riba Pau
Rusiñol Marçal
Villegas Mauricio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/05/2020
Field of study

Handwritten Text Recognition (HTR) is still a challenging problem because it must deal with two important difficulties: the variability among writing styles, and the scarcity of labelled data. To alleviate such problems, synthetic data generation and data augmentation are typically used to train HTR systems. However, training with such data produces encouraging but still inaccurate transcriptions in real words. In this paper, we propose an unsupervised writer adaptation approach that is able to automatically adjust a generic handwritten word recognizer, fully trained with synthetic fonts, towards a new incoming writer. We have experimentally validated our proposal using five different datasets, covering several challenges (i) the document source: modern and historic samples, which may involve paper degradation problems; (ii) different handwriting styles: single and multiple writer collections; and (iii) language, which involves different character combinations. Across these challenging collections, we show that our system is able to maintain its performance, thus, it provides a practical and generic approach to deal with new document collections without requiring any expensive and tedious manual annotation step.Comment: Accepted to WACV 202

arXiv.org e-Print Archive

Crossref

Handwritten Word Spotting with Corrected Attributes

Author: Almazan Jon
Fornés Alicia
Gordo Albert
Valveny Ernest
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2013
Field of study

International audienceWe propose an approach to multi-writer word spotting, where the goal is to find a query word in a dataset comprised of document images. We propose an attributes-based approach that leads to a low-dimensional, fixed-length representation of the word images that is fast to compute and, especially, fast to compare. This approach naturally leads to an unified representation of word images and strings, which seamlessly allows one to indistinctly perform query-by-example, where the query is an image, and query-by-string, where the query is a string. We also propose a calibration scheme to correct the attributes scores based on Canonical Correlation Analysis that greatly improves the results on a challenging dataset. We test our approach on two public datasets showing state-of-the-art results

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Joint Recognition of Handwritten Text and Named Entities with a Neural End-to-end Model

Author: Carbonell Manuel
Fornés Alicia
Lladós Josep
Villegas Mauricio
Publication venue
Publication date: 22/03/2018
Field of study

When extracting information from handwritten documents, text transcription and named entity recognition are usually faced as separate subsequent tasks. This has the disadvantage that errors in the first module affect heavily the performance of the second module. In this work we propose to do both tasks jointly, using a single neural network with a common architecture used for plain text recognition. Experimentally, the work has been tested on a collection of historical marriage records. Results of experiments are presented to show the effect on the performance for different configurations: different ways of encoding the information, doing or not transfer learning and processing at text line or multi-line region level. The results are comparable to state of the art reported in the ICDAR 2017 Information Extraction competition, even though the proposed technique does not use any dictionaries, language modeling or post processing.Comment: To appear in IAPR International Workshop on Document Analysis Systems 2018 (DAS 2018

arXiv.org e-Print Archive

Crossref